Abstract: A rateless code (i.e., a rate-compatible family of
codes) has the property that codewords of the higher rate codes
are prefixes of those of the lower rate ones. A perfect family 
of such codes is one in which each of the codes in the family is 
capacity-achieving. We show by construction that perfect rateless 
codes with low-complexity decoding algorithms exist for additive 
white Gaussian noise channels. Our construction involves the 
use of layered encoding and successive decoding, together with 
a repetition and dithering technique. As an illustration of our 
framework, we design a practical three-rate code family. We 
further construct rich sets of near-perfect rateless codes within 
our architecture that require either significantly fewer layers or 
lower complexity than their perfect counterparts. Variations of 
the basic construction are also discussed. 

Index Terms — Incremental redundancy, rate-compatible punc- 
tured codes, hybrid ARQ (H-ARQ), static broadcasting. 



I. Introduction 

The design of effective "rateless" codes has received renewed
strong interest in the coding community, motivated
by a number of emerging applications. Such codes have a 
long history, and have gone by various names over time, 
among them incremental redundancy codes, rate-compatible 
punctured codes, hybrid ARQ type II codes, and static broad- 
cast codes [3], [4], [9]-[12], [14], [18], [19], [24]. This paper 
focuses on the design of such codes for average power limited 
additive white Gaussian noise (AWGN) channels. Specifically, 
we develop techniques for mapping standard good single-rate 
codes for the AWGN channel into good rateless codes. 

From a purely information theoretic perspective the problem 
of rateless transmission is well understood; see Shulman [23] 
for a comprehensive treatment. Indeed, for channels having 
one maximizing input distribution, a codebook drawn indepen- 
dently and identically distributed (i.i.d.) at random from this 
distribution will be good with high probability, when truncated 
to (a finite number of) different lengths. Phrased differently, 
in such cases random codes are rateless codes. 

Constructing good codes that also have computationally 
efficient encoders and decoders requires more effort. Remarkable
examples of such codes for erasure channels are the
recent Raptor codes of Shokrollahi [22], which build on the
LT codes of Luby [2], [13]. An erasure channel model (for
packets) is most appropriate for rateless coding architectures 
anchored at the application layer, where there is little or no 
access to the physical layer. 

Apart from erasure channels, there is a growing interest in 
exploiting rateless codes closer to the physical layer, where 
AWGN models are more natural; see, e.g., [25] and the 
references therein. Surprisingly little is known about what is 
possible in this realm. Recent work [8], [17] applies Raptor 
codes to binary-input AWGN channels (among others), where 
it is shown that no degree distribution allows Raptor codes to 
approach capacity simultaneously at different signal to noise 
ratios (SNRs). Another line of work is based on puncturing of 
low-rate capacity-approaching codes such as turbo and LDPC 
codes [1], [9], [15], [18], [19], [25]. When iterative decoding 
is used, however, a balance must be struck between the 
performance at different rates. That is, improving performance 
at one rate comes at the expense of the performance at other 
rates. Beyond this issue, binary codes themselves may be 
"nearly" capacity achieving only at low SNR. 

In this paper, motivated by a host of emerging wireless 
applications, we work at the physical layer with an associated 
AWGN channel model, rather than with an erasure model. 
And as such, our focus is on that part of the network where 
traditional hybrid ARQ research has been aimed. The rateless 
codes that result are efficient, practical, and can operate at 
rates of multiple b/s/Hz. 

We show that the successful techniques employed to 
construct low-complexity codes for the standard AWGN 
channel — such as those arising out of turbo and low-density 
parity check (LDPC) codes — can be leveraged to construct 
rateless codes. Specifically, we develop an architecture in 
which a single codebook designed to operate at a single SNR is 
used in a straightforward manner to build a rateless codebook 
that operates at many SNRs. 

The encoding in our architecture exploits three key ingredi- 
ents: layering, dithering, and repetition. By layering, we mean 
the creation of a code by a linear combination of subcodes. 
By dithering we mean the use of multiplicative pre- and post- 
processing by known sequences. Finally, by repetition, we 
mean the use of simple linear redundancy in which each 
copy has a different complex gain. We show that with the 
appropriate combination of these ingredients, if the base codes 
are capacity-achieving, so will be the resulting rateless code. 

In addition to achieving capacity in our architecture, we 
seek to ensure that if the base code can be decoded with low 
complexity, so can the rateless code. As we will see, this 
is accomplished by imposing the constraint that the layered 
encoding be successively decodable — i.e., that the layers can 
be decoded one at a time, treating as yet undecoded layers as 
noise. 
Hence, our main result is the construction of capacity-
achieving, low-complexity rateless codes, i.e., rateless codes 
constructed from layering, dithering, and repetition, that are 
successively decodable. 

The paper is organized as follows. In Section II we introduce
the channel and system model. In Section III we motivate
and illustrate our construction with a simple special-case
example. In Section IV we develop our general construction
and show that within it exist perfect rateless codes for at least
some ranges of interest, and in Section V we develop and
analyze specific instances of our codes generated numerically.
In Section VI we show that within the constraints of our
construction rateless codes for any target ceiling and range
can be constructed that are arbitrarily close to perfect in an
appropriate sense. In Section VII we describe some potentially
useful variations on our basic construction, and their key
properties. Finally, Section VIII contains some concluding
remarks.

II. Channel and System Model 

The codes we construct are designed for a complex AWGN 
channel 

$$\mathbf{y}_m = \alpha \mathbf{x}_m + \mathbf{z}_m, \qquad m = 1, 2, \ldots, \tag{1}$$

where $\alpha$ is a channel gain,$^1$ $\mathbf{x}_m$ is a vector of $N$ input
symbols, $\mathbf{y}_m$ is the vector of channel output symbols, and
$\mathbf{z}_m$ is a noise vector of $N$ i.i.d. complex, circularly-symmetric
Gaussian random variables of variance $\sigma^2$, independent across
blocks $m = 1, 2, \ldots$. The channel input is limited to average
power $P$ per symbol. In our model, the channel gain $\alpha$ and
noise variance $\sigma^2$ are known a priori at the receiver but not at
the transmitter.$^2$

The block length $N$ has no important role in the analysis
that follows. It is, however, the block length of the base code 
used in the rateless construction. As the base code performance 
controls the overall code performance, to approach channel 
capacity N must be large. 

The encoder transmits a message $w$ by generating a sequence
of code blocks (incremental redundancy blocks) $\mathbf{x}_1(w)$,
$\mathbf{x}_2(w)$, .... The receiver accumulates sufficiently many
received blocks $\mathbf{y}_1$, $\mathbf{y}_2$, ... to recover $w$. The channel gain $\alpha$
may be viewed as a variable parameter in the model; more
incremental redundancy is needed to recover $w$ when $\alpha$ is
small than when $\alpha$ is large.

An important feature of this model is that the receiver
always starts receiving blocks from index $m = 1$. It does not
receive an arbitrary subsequence of blocks, as might be the
case if one were modeling a broadcast channel that permits
"tuning in" to an ongoing transmission; discussion of such a
scenario is deferred to Section VIII.

We now define some basic terminology and notation. Unless
noted otherwise, all logarithms are base 2, all symbols denote
complex quantities, and all rates are in bits per complex
symbol (channel use), i.e., b/s/Hz. We use $\cdot^{\mathrm{T}}$ for transpose
and $\cdot^{\dagger}$ for Hermitian (conjugate transpose) operators. Vectors
and matrices are denoted using bold face, random variables
are denoted using sans-serif fonts, while sample values use
regular (serif) fonts.

1 See Section VIII for a discussion regarding more general models for $\alpha$.

2 An equivalent model would be a broadcast channel in which a single
encoding of a common message is being sent to a multiplicity of receivers,
each experiencing a different SNR.

We define the ceiling rate of the rateless code as the highest 
rate R at which the code can operate, i.e., the effective rate 
if the message is decoded from the single received block $\mathbf{y}_1$;
hence, a message consists of NR information bits. Associated 
with this rate is an SNR threshold, which is the minimum SNR 
required in the realized channel for decoding to be possible 
from this single block. This SNR threshold can equivalently be 
expressed in the form of a channel gain threshold. Similarly, 
if the message is decoded from $m \ge 2$ received blocks,
the corresponding effective code rate is R/m, and there is 
a corresponding SNR (and channel gain) threshold. Thus, for 
a rateless encoding consisting of M blocks, there is a sequence 
of M associated SNR thresholds. 

Finally, as in the introduction, we refer to the code out 
of which our rateless construction is built as the base code, 
and the associated rate of this code as simply the base code 
rate. At points in our analysis we will assume that a good 
base code is used in the code design, i.e., that the base code 
is capacity-achieving for the AWGN channel, and thus has 
the associated properties of such codes. This will allow us to 
distinguish losses due to the code architecture from those due 
to the choice of base code. 

III. Motivating Example 

To develop initial insights, we construct a simple low- 
complexity perfect rateless code that employs two layers of 
coding to support a total of two redundancy blocks. 

We begin by noting that for the case of a rateless code with
two redundancy blocks the channel gain $|\alpha|$ may be divided
into three intervals based on the number of blocks needed for
decoding. Let $\alpha_1$ and $\alpha_2$ denote the two associated channel
gain thresholds. When $|\alpha| \ge |\alpha_1|$ decoding requires only one
block. When $|\alpha_1| > |\alpha| \ge |\alpha_2|$ decoding requires two blocks.
When $|\alpha_2| > |\alpha|$ decoding is not possible. The interesting
cases occur when the gain is as small as possible to permit
decoding. At these threshold values, for one-block decoding
the decoder sees

$$\mathbf{y}_1 = \alpha_1 \mathbf{x}_1 + \mathbf{z}_1, \tag{2}$$

while for two-block decoding the decoder sees

$$\mathbf{y}_1 = \alpha_2 \mathbf{x}_1 + \mathbf{z}_1, \tag{3}$$
$$\mathbf{y}_2 = \alpha_2 \mathbf{x}_2 + \mathbf{z}_2. \tag{4}$$

In general, given any particular choice of the ceiling rate
$R$ for the code, we would like the resulting SNR thresholds
to be as low as possible. To determine lower bounds on these
thresholds, let

$$\mathrm{SNR}_m = P|\alpha_m|^2/\sigma^2, \tag{5}$$

and note that the capacity of the one-block channel is

$$I_1 = \log(1 + \mathrm{SNR}_1), \tag{6}$$

while for the two-block channel the capacity is

$$I_2 = 2\log(1 + \mathrm{SNR}_2) \tag{7}$$

bits per channel use. A "channel use" in the second case
consists of a pair of transmitted symbols, one from each block. 

In turn, since we deliver the same message to the receiver
for both the one- and two-block cases, the smallest values of
$|\alpha_1|$ and $|\alpha_2|$ we can hope to achieve occur when

$$I_1 = I_2 = R. \tag{8}$$



Thus, we say that the code is perfect if it is decodable at these 
limits. 
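As a quick numerical illustration of (5)-(8), the following minimal sketch solves $m \log(1 + \mathrm{SNR}_m) = R$ for the threshold SNR. The ceiling rate $R = 4$ b/s/Hz and the helper names `snr_threshold` and `to_db` are assumptions chosen only for this example; they do not appear in the paper.

```python
import math

def snr_threshold(ceiling_rate: float, m: int) -> float:
    """Smallest linear-scale SNR at which m blocks can carry ceiling_rate bits.

    Solves m * log2(1 + SNR_m) = R for SNR_m, which reduces to (6)-(8)
    for m = 1 and m = 2.
    """
    return 2.0 ** (ceiling_rate / m) - 1.0

def to_db(x: float) -> float:
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(x)

R = 4.0  # hypothetical ceiling rate in b/s/Hz, for illustration only
snr1 = snr_threshold(R, 1)  # one-block threshold, 2^R - 1
snr2 = snr_threshold(R, 2)  # two-block threshold, 2^(R/2) - 1
print(f"SNR_1 = {snr1:.1f} ({to_db(snr1):.2f} dB)")
print(f"SNR_2 = {snr2:.1f} ({to_db(snr2):.2f} dB)")
```

A perfect two-block code in the sense above would decode from one block at `snr1` and from two blocks at `snr2`.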

We next impose that the construction be a layered code, and 
that the layers be successively decodable. 

Our layering constraint means that we require the transmitted
blocks to be linear combinations of two base codewords
$\mathbf{c}_1 \in \mathcal{C}_1$ and $\mathbf{c}_2 \in \mathcal{C}_2$:$^3$

$$\mathbf{x}_1 = g_{11}\mathbf{c}_1 + g_{12}\mathbf{c}_2, \tag{9}$$
$$\mathbf{x}_2 = g_{21}\mathbf{c}_1 + g_{22}\mathbf{c}_2. \tag{10}$$

Base codebook $\mathcal{C}_1$ has rate $R_1$ and base codebook $\mathcal{C}_2$ has
rate $R_2$, where $R_1 + R_2 = R$, so that the total rate of the
two codebooks equals the ceiling rate. We assume for this
example that both codebooks are capacity-achieving, so that
the codeword components are i.i.d. Gaussian. Furthermore, for
convenience, we scale the codebooks to have unit power, so
the power constraint instead enters through the constraints

$$|g_{11}|^2 + |g_{12}|^2 = P, \tag{11}$$
$$|g_{21}|^2 + |g_{22}|^2 = P. \tag{12}$$



Finally, the successive decoding constraint in our system 
means that the layers are decoded one at a time to keep 
complexity low (on order of the base code complexity). 
Specifically, the decoder first recovers $\mathbf{c}_2$ while treating $\mathbf{c}_1$
as additive Gaussian noise, then recovers $\mathbf{c}_1$ using $\mathbf{c}_2$ as side
information.

We now show that perfect rateless codes are possible within
these constraints by constructing a matrix $\mathbf{G} = [g_{ml}]$ so that
the resulting code satisfies (8). Finding an admissible $\mathbf{G}$ is simply
a matter of some algebra: in the one-block case we need

$$R_1 = I_{\alpha_1}(c_1; y_1 \mid c_2), \tag{13}$$
$$R_2 = I_{\alpha_1}(c_2; y_1), \tag{14}$$

and in the two-block case we need

$$R_1 = I_{\alpha_2}(c_1; y_1, y_2 \mid c_2), \tag{15}$$
$$R_2 = I_{\alpha_2}(c_2; y_1, y_2). \tag{16}$$

The subscripts $\alpha_1$ and $\alpha_2$ are a reminder that these mutual
information expressions depend on the channel gain, and the
scalar variables denote individual components from the input
and output vectors.

While evaluating (13)-(15) is straightforward, calculating
the more complicated (16), which corresponds to decoding
$\mathbf{c}_2$ in the two-block case, can be circumvented by a little
additional insight. In particular, while $\mathbf{c}_1$ causes the effective
noise in the two blocks to be correlated, observe that
a capacity-achieving code requires $\mathbf{x}_1$ and $\mathbf{x}_2$ to be i.i.d.
Gaussian. As $\mathbf{c}_1$ and $\mathbf{c}_2$ are Gaussian, independent, and equal
in power by assumption, this occurs only if the rows of $\mathbf{G}$
are orthogonal. Moreover, the power constraint $P$ ensures that
these orthogonal rows have the same norm, which implies that
$\mathbf{G}$ is a scaled unitary matrix.

3 In practice, the codebooks $\mathcal{C}_1$ and $\mathcal{C}_2$ should not be identical, though they
can for example be derived from a common base codebook via scrambling.
This point is discussed further in Section VII-B.

The unitary constraint has an immediate important consequence:
the per-layer rates $R_1$ and $R_2$ must be equal:

$$R_1 = R_2 = R/2. \tag{17}$$

This occurs because the two-block case decomposes into two 
parallel orthogonal channels of equal SNR. We will see in the 
next section that a comparable result holds for any number of 
layers. 

From the definitions of $\mathrm{SNR}_1$ and $I_1$ [cf. (5) and (6)], and
the equality $I_1 = R$ from (8), we find that

$$P|\alpha_1|^2/\sigma^2 = 2^R - 1. \tag{18}$$

Also, from (13) and (17), we find that

$$|g_{11}|^2 |\alpha_1|^2/\sigma^2 = 2^{R/2} - 1. \tag{19}$$

Combining (18) and (19) yields

$$|g_{11}|^2 = P\,\frac{2^{R/2} - 1}{2^R - 1} = \frac{P}{2^{R/2} + 1}. \tag{20}$$



The constraint that $\mathbf{G}$ be a scaled unitary matrix, together with
the power constraint $P$, implies

$$|g_{12}|^2 = P - |g_{11}|^2, \tag{21}$$
$$|g_{21}|^2 = P - |g_{11}|^2, \tag{22}$$
$$|g_{22}|^2 = P - |g_{21}|^2 = |g_{11}|^2, \tag{23}$$

which completely determines the squared modulus of the
entries of $\mathbf{G}$.

Now, the mutual information expressions (13)-(16) are
unaffected by applying a common complex phase shift to any
row or column of $\mathbf{G}$, so without loss of generality we take the
first row and first column of $\mathbf{G}$ to be real and positive. For
$\mathbf{G}$ to be a scaled unitary matrix, $g_{22}$ must then be real and
negative. We have thus shown that, if a solution to (13)-(16)
exists, it must have the form

$$\mathbf{G} = \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix} = \sqrt{\frac{P}{2^{R/2} + 1}} \begin{bmatrix} 1 & 2^{R/4} \\ 2^{R/4} & -1 \end{bmatrix}. \tag{24}$$

Conversely, it is straightforward to verify that (13)-(16) are
satisfied with this selection. Thus (24) characterizes the
(essentially) unique solution $\mathbf{G}$.
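The verification mentioned above can be sketched numerically. The snippet below builds the matrix of (24) and evaluates the four mutual information expressions (13)-(16) under the Gaussian-codebook assumption, using the chain rule $I(c_2; y) = I(c_1, c_2; y) - I(c_1; y \mid c_2)$. The values $P = 1$, $R = 4$, $\sigma^2 = 1$ and the function names are assumptions made for this illustration only.

```python
import math

def two_layer_gain_matrix(P: float, R: float):
    """G from (24): sqrt(P / (2^(R/2) + 1)) * [[1, 2^(R/4)], [2^(R/4), -1]]."""
    scale = math.sqrt(P / (2.0 ** (R / 2) + 1.0))
    r = 2.0 ** (R / 4)
    return [[scale, scale * r], [scale * r, -scale]]

def layer_rates(G, gain_sq: float, blocks: int):
    """Per-layer rates (R1, R2) under successive decoding, unit noise variance.

    c2 is decoded first with c1 treated as Gaussian noise; c1 is decoded
    next with c2 cancelled. Only the first `blocks` rows of the real
    matrix G are used.
    """
    rows = G[:blocks]
    # I(c1; y | c2): with c2 cancelled, combining over column 1 gives a
    # scalar Gaussian channel with total energy e1.
    e1 = sum(g[0] ** 2 for g in rows)
    r1 = math.log2(1.0 + gain_sq * e1)
    # I(c1, c2; y) = log2 det(I + gain_sq * G G^T) for real G.
    if blocks == 1:
        total = math.log2(1.0 + gain_sq * (rows[0][0] ** 2 + rows[0][1] ** 2))
    else:
        a = 1.0 + gain_sq * (rows[0][0] ** 2 + rows[0][1] ** 2)
        d = 1.0 + gain_sq * (rows[1][0] ** 2 + rows[1][1] ** 2)
        b = gain_sq * (rows[0][0] * rows[1][0] + rows[0][1] * rows[1][1])
        total = math.log2(a * d - b * b)
    # Chain rule: I(c2; y) = I(c1, c2; y) - I(c1; y | c2).
    return r1, total - r1

P, R = 1.0, 4.0                         # illustrative values, sigma^2 = 1
G = two_layer_gain_matrix(P, R)
alpha1_sq = (2.0 ** R - 1.0) / P        # one-block threshold, from (18)
alpha2_sq = (2.0 ** (R / 2) - 1.0) / P  # two-block threshold, from I_2 = R
rates_one_block = layer_rates(G, alpha1_sq, 1)
rates_two_blocks = layer_rates(G, alpha2_sq, 2)
print(rates_one_block, rates_two_blocks)
```

Under these assumptions every entry should come out equal to $R/2$, in agreement with (17).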

In summary, we have constructed a 2-layer, 2-block perfect 
rateless code from linear combinations of codewords drawn 
from equal-rate codebooks. Moreover, decoding can proceed 
one layer at a time with no loss in performance, provided 
the decoder is cognizant of the correlated noise caused by 
undecoded layers. In the sequel we consider the generalization 
of our construction to an arbitrary number of layers and 
redundancy blocks. 
Fig. 1. A rateless code construction with 4 layers and 3 blocks of redundancy.
Each block is a weighted linear combination of the ($N$-element) base
codewords $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_4$, where $g_{ml}$, the $(m, l)$th element of $\mathbf{G}$, denotes
the weight for layer $l$ of block $m$.



IV. Rateless Codes with Layered Encoding and 
Successive Decoding 

The rateless code construction we pursue is as follows
[7]. First, we choose the range (maximum number $M$ of
redundancy blocks), the ceiling rate $R$, the number of layers $L$,
and finally the associated codebooks $\mathcal{C}_1, \ldots, \mathcal{C}_L$. We assume
a priori that the $L$ base codebooks all have equal rate $R/L$;
this assumption turns out to be necessary when constructing
perfect rateless codes with $M = L$, and in any case has the
advantage of allowing the codebooks for each layer to be
derived from a single base code.

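Given codewords $\mathbf{c}_l \in \mathcal{C}_l$, $l = 1, \ldots, L$, the redundancy
blocks $\mathbf{x}_1, \ldots, \mathbf{x}_M$ take the form

$$\begin{bmatrix} \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_M \end{bmatrix} = \mathbf{G} \begin{bmatrix} \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_L \end{bmatrix}, \tag{25}$$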



where $\mathbf{G}$ is an $M \times L$ matrix of complex gains and where
$\mathbf{x}_m$ for each $m$ and $\mathbf{c}_l$ for each $l$ are row vectors of length $N$.
The power constraint enters by limiting the rows of $\mathbf{G}$ to have
squared norm $P$ and by normalizing the codebooks to have
unit power. Note that with this notation, the $m$th row of $\mathbf{G}$
contains the weights used in constructing the $m$th redundancy block
from the $L$ codewords.$^4$ In the sequel we use $g_{ml}$ to denote the
$(m, l)$th entry of $\mathbf{G}$ and $\mathbf{G}_{m,l}$ to denote the upper-left $m \times l$
submatrix of $\mathbf{G}$.
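The layered encoding of (25) amounts to a matrix product applied symbol by symbol. A minimal sketch follows, with an arbitrary toy gain matrix and toy codewords that are not taken from the paper; the function name `encode_blocks` is likewise hypothetical.

```python
def encode_blocks(G, codewords):
    """Form redundancy blocks x_m = sum_l g_ml * c_l, as in (25).

    G         -- M x L list of lists of (possibly complex) gains
    codewords -- L lists, each holding the N symbols of one base codeword
    Returns M lists of N symbols.
    """
    n = len(codewords[0])
    return [
        [sum(g * c[k] for g, c in zip(row, codewords)) for k in range(n)]
        for row in G
    ]

# Toy example: L = 2 layers, M = 2 blocks, N = 3 symbols (arbitrary values).
G = [[1.0, 2.0], [2.0, -1.0]]
c1 = [1 + 0j, 0 + 1j, -1 + 0j]
c2 = [0 + 1j, 1 + 0j, 1 + 1j]
x = encode_blocks(G, [c1, c2])
print(x)
```

Each block reuses every base codeword with a different weight, which is the repetition-with-gain structure described in the text.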

An example of this layered rateless code structure is depicted
in Fig. 1. Each redundancy block contains a repetition
of the codewords used in the earlier blocks, but with a different 
complex scaling factor. The code structure may therefore be 
viewed as a hybrid of layering and repetition. Note that, absent 
assumptions on the decoder, the order of the layers is not 
important. 

In addition to the layered code structure, there is additional
decoding structure, namely that the layered code be successively
decodable. Specifically, to recover the message, we first
decode $\mathbf{c}_L$, treating $\mathbf{G}\,[\mathbf{c}_1^{\mathrm{T}} \cdots \mathbf{c}_{L-1}^{\mathrm{T}}\ \mathbf{0}]^{\mathrm{T}}$ as (colored) noise,
then decode $\mathbf{c}_{L-1}$, treating $\mathbf{G}\,[\mathbf{c}_1^{\mathrm{T}} \cdots \mathbf{c}_{L-2}^{\mathrm{T}}\ \mathbf{0}\ \mathbf{0}]^{\mathrm{T}}$ as noise,
and so on. Thus, our aim is to select $\mathbf{G}$ so that capacity is
achieved for any number $m = 1, \ldots, M$ of redundancy blocks
subject to the successive decoding constraint.

4 The $l$th column of $\mathbf{G}$ also has a useful interpretation. In particular, one can
interpret the construction as equivalent to a "virtual" code-division multiple-
access (CDMA) system with $L$ users, each corresponding to one layer of the
rateless code. With this interpretation, the signature (spreading) sequence for
the $l$th virtual user is the $l$th column of $\mathbf{G}$.

Both the layered repetition structure (25) and the successive
decoding constraint impact the degree to which we
can approach a perfect code. Accordingly, we examine the
consequences of each in turn.

We begin by examining the implications of the layered
repetition structure (25). When the number of layers $L$ is at
least as large as the number of redundancy blocks $M$, such
layering does not limit code performance. But when $L < M$,
it does. In particular, whenever the number $m$ of redundancy
blocks required by the realized channel exceeds $L$, there is
necessarily a gap between the code performance and capacity.
To see this, observe that (25) with (1), restricted to the first
$m$ blocks, defines a linear $L$-input $m$-output AWGN channel,
the capacity of which is at most

$$I'_m = \begin{cases} m \log\!\left(1 + \dfrac{P|\alpha|^2}{\sigma^2}\right) & \text{for } m \le L, \\[2mm] L \log\!\left(1 + \dfrac{m}{L}\,\dfrac{P|\alpha|^2}{\sigma^2}\right) & \text{for } m > L. \end{cases} \tag{26}$$

Only for $m \le L$ does this match the capacity of a general
$m$-block AWGN channel, viz.,

$$I_m = m \log\!\left(1 + \frac{P|\alpha|^2}{\sigma^2}\right). \tag{27}$$

Ultimately, for $m > L$ the problem is that an $L$-fold linear
combination cannot fill all degrees of freedom afforded by the
$m$-block channel.

An additional penalty occurs when we combine the layered
repetition structure with the requirement that the code be rateless.
Specifically, for $M > L$, there is no choice of gain matrix
$\mathbf{G}$ that permits (26) to be met with equality simultaneously for
all $m = 1, \ldots, M$. A necessary and sufficient condition for
equality is that the rows of $\mathbf{G}_{m,L}$ be orthogonal for $m \le L$
and the columns of $\mathbf{G}_{m,L}$ be orthogonal for $m > L$. This
follows because reaching (26) for $m \le L$ requires that the
linear combination of $L$ codebooks create an i.i.d. Gaussian
sequence. In contrast, reaching (26) for $m > L$ requires that
the linear combination inject the $L$ codebooks into orthogonal
subspaces, so that a fraction $L/m$ of the available degrees
of freedom are occupied by i.i.d. Gaussians (the rest being
empty).

Unfortunately, the columns of $\mathbf{G}_{m,L}$ cannot be orthogonal
simultaneously for all $m > L$. That would entail the construction
of orthogonal $m$-dimensional vectors (with nonzero entries)
that remain orthogonal when truncated to their first $m - 1$
dimensions, an obvious impossibility. Thus (26) determines
only a lower bound on the loss due to the layering structure
(25). Fortunately, the additional loss encountered in practice
turns out to be quite small, as we demonstrate numerically as
part of the next section.

Our lower bound on loss incurred by the use of insufficiently
many layers is readily obtained by comparing (26) and (27).
Specifically, given a choice of ceiling rate $R$ for the rateless
code, (26) implies that for rateless codes constructed using
linear combinations of $L$ base codes, the smallest channel gain
$\alpha'_m$ for which it is possible to decode with $m$ blocks is

$$|\alpha'_m|^2 = \begin{cases} (2^{R/m} - 1)\,\dfrac{\sigma^2}{P} & \text{for } m \le L, \\[2mm] (2^{R/L} - 1)\,\dfrac{L\sigma^2}{mP} & \text{for } m > L. \end{cases} \tag{28}$$

By comparison, (27) implies that without the layering constraint
the corresponding channel gain thresholds $\alpha_m$ are

$$|\alpha_m|^2 = (2^{R/m} - 1)\,\frac{\sigma^2}{P}. \tag{29}$$

The resulting performance loss $|\alpha'_m|/|\alpha_m|$ caused by the
layered structure as calculated from (28) and (29) is shown in
dB in Table I for a target ceiling rate of $R = 5$ bits/symbol.
For example, if an application requires $M = 10$ redundancy
blocks, a 3-layer code has a loss of less than 2 dB at $m = 10$,
while a 5-layer code has a loss of less than 0.82 dB at $m = 10$.

TABLE I

Losses $|\alpha'_m|/|\alpha_m|$ in dB due to layered structure imposed on a
rateless code of ceiling rate $R = 5$ b/s/Hz, as a function of the
number of layers $L$ and redundancy blocks $m$.

                      Redundancy blocks m
           2     3     4     5     6     7     8     9    10
L = 1   5.22  6.77  7.50  7.92  8.20  8.40  8.54  8.65  8.74
L = 2   0.00  1.55  2.28  2.70  2.98  3.17  3.32  3.43  3.52
L = 3   0.00  0.00  0.73  1.16  1.43  1.63  1.77  1.88  1.97
L = 4   0.00  0.00  0.00  0.42  0.70  0.90  1.04  1.15  1.24
L = 5   0.00  0.00  0.00  0.00  0.28  0.47  0.62  0.73  0.82
L = 6   0.00  0.00  0.00  0.00  0.00  0.20  0.34  0.45  0.54
L = 7   0.00  0.00  0.00  0.00  0.00  0.00  0.14  0.26  0.35
L = 8   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.11  0.20
L = 9   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.09
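The entries of Table I follow from the ratio of the layered threshold (28) to the unconstrained threshold (29). The sketch below recomputes a few rows; the function names are hypothetical, and the thresholds are expressed in units of $\sigma^2/P$, which cancel in the ratio.

```python
import math

def layered_threshold(R: float, L: int, m: int) -> float:
    """|alpha'_m|^2 * P / sigma^2 from (28): threshold with L layers."""
    if m <= L:
        return 2.0 ** (R / m) - 1.0
    return (2.0 ** (R / L) - 1.0) * L / m

def optimal_threshold(R: float, m: int) -> float:
    """|alpha_m|^2 * P / sigma^2 from (29): threshold without layering."""
    return 2.0 ** (R / m) - 1.0

def layering_loss_db(R: float, L: int, m: int) -> float:
    """Loss |alpha'_m| / |alpha_m| in dB, i.e. 20 log10 of the amplitude ratio."""
    ratio_sq = layered_threshold(R, L, m) / optimal_threshold(R, m)
    return 10.0 * math.log10(ratio_sq)

# Recompute the L = 1, 3, 5 rows of Table I for R = 5 b/s/Hz.
for L in (1, 3, 5):
    print(L, [round(layering_loss_db(5.0, L, m), 2) for m in range(2, 11)])
```

The loss is zero whenever $m \le L$ and grows with $m$ beyond that point, which is the pattern visible in the table.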

As Table I reflects (and as can be readily verified), for a
fixed number of layers $L$ and a fixed base code rate $R/L$,
the performance loss $|\alpha'_m|/|\alpha_m|$ attributable to the imposition
of layered encoding grows monotonically with the number of
blocks $m$, approaching the limit

$$\frac{|\alpha'_\infty|^2}{|\alpha_\infty|^2} = \frac{2^{R/L} - 1}{(R/L)\ln 2}. \tag{30}$$

Thus, in applications where the number of incremental redun- 
dancy blocks is very large, it's advantageous to keep the base 
code rate small. For example, with a base code rate of 1/2 
bit per complex symbol (implemented, for example, using a 
rate- 1/4 binary code) the loss due to layering is at most 0.78 
dB, while with a base code rate of 1 bit per complex symbol 
the loss is at most 1.6 dB. 
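For completeness, the limit (30) and the two numerical values quoted above can be recovered from (28) and (29) with a first-order expansion in $1/m$. The following is a short worked derivation under that expansion, not a restatement of the authors' own argument.

```latex
% Ratio of squared thresholds for m > L, from (28) and (29):
\frac{|\alpha'_m|^2}{|\alpha_m|^2}
  = \frac{(2^{R/L} - 1)\, L/m}{2^{R/m} - 1}, \qquad m > L.

% For large m, 2^{R/m} - 1 = e^{(R \ln 2)/m} - 1 = \frac{R \ln 2}{m} + O(m^{-2}), so
\lim_{m \to \infty} \frac{|\alpha'_m|^2}{|\alpha_m|^2}
  = \frac{(2^{R/L} - 1)\, L}{R \ln 2}
  = \frac{2^{R/L} - 1}{(R/L) \ln 2}.

% Numerical values:
% R/L = 1/2:  (\sqrt{2} - 1) / (0.5 \ln 2) \approx 1.195, i.e. about 0.77 dB.
% R/L = 1:    1 / \ln 2 \approx 1.443,                    i.e. about 1.59 dB.
```

Both values sit just under the 0.78 dB and 1.6 dB bounds stated in the text.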

We now determine the additional impact the successive
decoding requirement has on our ability to approach capacity,
and more generally what constraints it imposes on $\mathbf{G}$. We
continue to incorporate the power constraint by taking the
rate-$R/L$ codebooks $\mathcal{C}_1, \ldots, \mathcal{C}_L$ to have unit power and the rows
of $\mathbf{G}$ to have squared norm $P$. Since our aim is to employ
codebooks designed for (non-fading) Gaussian channels, we
make the further assumption that the codebooks have constant
power, i.e., that they satisfy the per-symbol energy constraint
$E[|c_{l,n}(w)|^2] \le 1$ for all layers $l$ and time indices
$n = 1, \ldots, N$, where the expectation is taken over equiprobable
messages $w \in \{1, \ldots, 2^{NR/L}\}$. Additional constraints on
$\mathbf{G}$ will now follow from the requirement that the mutual
information accumulated through any block $m$ at each layer $l$
be large enough to permit successive decoding.

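Concretely, suppose we have received blocks $1, \ldots, m$.
Let the optimal threshold channel gain $\alpha_m$ be defined as in
Section III, i.e., as the solution to [cf. (27)]

$$R = m \log\!\left(1 + (|\alpha_m|^2/\sigma^2) P\right). \tag{31}$$

Suppose further that layers $l + 1, \ldots, L$ have been successfully
decoded, and define

$$\begin{bmatrix} \mathbf{v}_1 \\ \vdots \\ \mathbf{v}_m \end{bmatrix} = \alpha_m \mathbf{G}_{m,l} \begin{bmatrix} \mathbf{c}_1 \\ \vdots \\ \mathbf{c}_l \end{bmatrix} + \begin{bmatrix} \mathbf{z}_1 \\ \vdots \\ \mathbf{z}_m \end{bmatrix} \tag{32}$$

as the received vectors without the contribution from layers
$l + 1, \ldots, L$.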

Then, following standard arguments, with independent
equiprobable messages for each layer, the probability of decoding
error for layer $l$ can be made vanishingly small with
increasing block length only if the mutual information between
input and output is at least as large as the rate $R/L$ of the code
$\mathcal{C}_l$. That is, successive decoding requires

$$R/L \le (1/N)\, I(\mathbf{c}_l; \mathbf{y}_1, \ldots, \mathbf{y}_m \mid \mathbf{c}_{l+1}^{L}) \tag{33}$$
$$\phantom{R/L} = (1/N)\, I(\mathbf{c}_l; \mathbf{v}_1, \ldots, \mathbf{v}_m) \tag{34}$$
$$\phantom{R/L} \le \log \frac{\det\!\left(\mathbf{I} + (|\alpha_m|^2/\sigma^2)\, \mathbf{G}_{m,l} \mathbf{G}_{m,l}^{\dagger}\right)}{\det\!\left(\mathbf{I} + (|\alpha_m|^2/\sigma^2)\, \mathbf{G}_{m,l-1} \mathbf{G}_{m,l-1}^{\dagger}\right)}, \tag{35}$$

where $\mathbf{I}$ is an appropriately sized ($m \times m$) identity matrix.
The inequality (35) relies on the assumption that the codebooks
have constant power, and it holds with equality if the
components of $\mathbf{c}_1, \ldots, \mathbf{c}_l$ are i.i.d. Gaussian.

Our ability to choose $\mathbf{G}$ to either exactly or approximately
satisfy (35) for all $l = 1, \ldots, L$ and each $m = 1, \ldots, M$
determines the degree to which we can approach capacity. It
is straightforward to see that there is no slack in the problem;
(35) can be satisfied simultaneously for all $l$ and $m$ only if the
inequalities are all met with equality. Beyond this observation,
however, the conditions under which (35) may be satisfied are
not obvious.

Characterizing the set of solutions for $\mathbf{G}$ when $L = M = 2$
was done in Section III (see (24)). Characterizing the set of
solutions when $L = M = 3$ requires more work. It is shown
in Appendix I that, when it exists, a solution $\mathbf{G}$ must have the
form

$$\mathbf{G} = \sqrt{\frac{P(x - 1)}{x^6 - 1}} \begin{bmatrix} \sqrt{x + 1} & \sqrt{x^2 (x + 1)} & \sqrt{x^4 (x + 1)} \\ \sqrt{x^3 (x + 1)} & e^{j\theta_1} \sqrt{x^5 + 1} & e^{j\theta_2} \sqrt{x (x + 1)} \\ \sqrt{x^2 (x^3 + 1)} & e^{j\theta_3} \sqrt{x (x^3 + 1)} & e^{j\theta_4} \sqrt{x^3 + 1} \end{bmatrix}, \tag{36}$$

where $x = 2^{R/6}$ and where $e^{j\theta_i}$, $i = 1, \ldots, 4$, are complex
phasors. The desired phasors (or a proof of nonexistence)
may be determined from the requirement that $\mathbf{G}$ be a scaled
unitary matrix. Using this observation, it is shown in Appendix I
that a solution $\mathbf{G}$ exists and is unique (up to complex
conjugate) for all $R \le 3(\log(7 + 3\sqrt{5}) - 1) \approx 8.33$ bits per
complex symbol, but no choice of phasors results in a unitary
$\mathbf{G}$ for larger values of $R$.

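For example, using (36) with $R = 6$ bits/symbol we find
that:

$$P = 63, \quad \alpha_1 = 1, \quad \alpha_2 = 1/3, \quad \alpha_3 = 1/\sqrt{21},$$

$$\mathbf{G} = \begin{bmatrix} \sqrt{3} & \sqrt{12} & \sqrt{48} \\ \sqrt{24} & \sqrt{33}\, e^{j\theta_1} & \sqrt{6}\, e^{j\theta_2} \\ \sqrt{36} & \sqrt{18}\, e^{j\theta_3} & \sqrt{9}\, e^{j\theta_4} \end{bmatrix},$$

where

$$\theta_1 = \arccos\!\left(\frac{-5}{2\sqrt{22}}\right), \qquad \theta_2 = 2\pi - \arctan 3\sqrt{7},$$
$$\theta_3 = -\arctan \sqrt{7}, \qquad \theta_4 = \pi - \arctan\!\left(\sqrt{7}/3\right).$$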


For M > 3 the algebra becomes daunting, though we 
conjecture that exact solutions and hence perfect rateless codes 
exist for all L = M, for at least some nontrivial values of R. 

For $L < M$ perfect constructions cannot exist. As developed
earlier in this section, even if we replace the optimum
threshold channel gains $\alpha_m$ defined via (31) with suboptimal
gains determined by the layering bound (26), viz.,

$$R = \begin{cases} m \log\!\left(1 + \dfrac{|\alpha'_m|^2 P}{\sigma^2}\right) & \text{for } m \le L, \\[2mm] L \log\!\left(1 + \dfrac{m}{L}\,\dfrac{|\alpha'_m|^2 P}{\sigma^2}\right) & \text{for } m > L, \end{cases} \tag{37}$$

it is still not possible to satisfy (35). However, one can come
close to satisfying (35) in such cases. While the associated
analysis is nontrivial, such behavior is easily demonstrated
numerically, which we show as part of the next section.

V. Numerical Examples 

In this section, we consider numerical constructions both
for the case $L = M$ and for the case $L < M$. Specifically,
we have experimented with numerical optimization methods
to satisfy (35) for up to $M = 10$ redundancy blocks, using the
threshold channel gains $\alpha'_m$ defined via (37) in place of those
defined via (31) as appropriate when the number of blocks $M$
exceeds the number of layers $L$.

For the case $L = M$, for each of $M = 2, 3, \ldots, 10$, we
found constructions with $R/L = 2$ bits/symbol that come
within 0.1% of satisfying (35) subject to (31), and often the
solutions come within 0.01%. This provides powerful evidence
that perfect rateless codes exist for a wide range of parameter
choices.

For the case $L < M$, despite the fact that there do not
exist perfect codes, in most cases of interest one can come
remarkably close to satisfying (35) subject to (37). Evidently
mutual information for Gaussian channels is quite insensitive
to modest deviations of the noise covariance away from a
scaled identity matrix.

As an example, Table II shows the rate shortfall in meeting
the mutual information constraints (35) for an $L = 3$ layer
code with $M = 10$ redundancy blocks, and a target ceiling
rate $R = 5$.

TABLE II

Percent shortfall in rate for a numerically-optimized
rateless code with $M = 10$ blocks, $L = 3$ layers, and a ceiling
rate of $R = 5$ b/s/Hz.

                        Redundancy blocks m
           1     2     3     4     5     6     7     8     9    10
l = 1   0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
l = 2   0.00  0.28  1.23  1.46  1.39  0.44  0.59  0.48  0.16  0.23
l = 3   0.00  0.29  1.23  1.48  1.40  0.43  0.54  0.51  0.15  0.23

The associated complex gain matrix is



$$\mathbf{G} = \begin{bmatrix}
1.4747 & 2.6277 & 4.6819 \\
3.5075 & 3.7794\, e^{j2.0810} & 2.1009\, e^{-j1.9486} \\
4.0648 & 3.1298\, e^{-j0.9531} & 2.1637\, e^{j2.5732} \\
3.2146 & 3.1322\, e^{j3.0765} & 3.2949\, e^{j0.9132} \\
3.2146 & 3.3328\, e^{-j1.6547} & 3.0918\, e^{-j1.4248} \\
3.2146 & 3.1049\, e^{j0.9409} & 3.3206\, e^{j2.8982} \\
3.2146 & 3.3248\, e^{j1.2506} & 3.1004\, e^{-j0.2027} \\
3.2146 & 3.0980\, e^{-j1.4196} & 3.3270\, e^{j1.9403} \\
3.2146 & 3.2880\, e^{-j2.9449} & 3.1394\, e^{-j1.9243} \\
3.2146 & 3.1795\, e^{j0.7839} & 3.2492\, e^{j\,?.3413}
\end{bmatrix}$$



The worst case loss is less than 1.5%; this example is typical
in its efficiency.

The total loss of the designed code relative to a perfect
rateless code is, of course, the sum of the successive decoding
and layered encoding constraint losses. Hence, the losses in
Table II and Table I are cumulative. As a practical matter,
however, when $L < M$, the layered encoding constraint loss
dwarfs that due to the successive decoding constraint: the
overall performance loss arises almost entirely from the code's
inability to occupy all available degrees of freedom in the
channel. Thus, this overall loss can be estimated quite closely
by comparing (27) and (26) or, equivalently, (31) and (37).
Indeed this is reflected in our example, where the loss of
Table I dominates over that of Table II.

VI. Existence of Near-Perfect Rateless Codes 

While the closed-form construction of perfect rateless codes 
subject to layered encoding and successive decoding becomes 
more challenging with increasing code range M, the construction 
of codes that are at least nearly perfect is comparatively 
straightforward. In the preceding section, we demonstrated 
this numerically. In this section, we prove this analytically. In 
particular, we construct rateless codes that are arbitrarily close 
to perfect in an appropriate sense, provided enough layers 
are used. We term these near-perfect rateless codes. The code 
construction we present will be applicable to arbitrarily large 
M and will also allow for simpler decoding than the MMSE 
decoder employed in the preceding development. 

A. Encoding 

Our near-perfect rateless code construction [5] is a slight 
generalization of that used in Section IV. Specifically, as 
(25) indicates, in our approach to perfect constructions we 
made each redundancy block a linear combination of the base 
codewords, where the weights are the corresponding row of 
the combining matrix G. This means that each individual 
symbol of a particular redundancy block is a linear 
combination of the corresponding symbols in the respective 
base codewords, with the combining matrix being the same 
for all such symbols. 

By contrast, in this section, we allow the combining matrix 
to vary from symbol to symbol in the construction of each 
redundancy block, and use the additional degrees of freedom 
in the code design to simplify the analysis, at the expense of 
some slightly more cumbersome notation. In particular, using 
$c_l(n)$ and $x_m(n)$ to denote the nth elements of codeword $\mathbf{c}_l$ 
and redundancy block $\mathbf{x}_m$, respectively, we have [cf. (25)] 

$$
\begin{bmatrix} x_1(n) \\ \vdots \\ x_M(n) \end{bmatrix}
= \mathbf{G}(n) \begin{bmatrix} c_1(n) \\ \vdots \\ c_L(n) \end{bmatrix},
\qquad n = 1, 2, \ldots, N. \tag{38}
$$

The value of M plays no role in our development and may 
be taken arbitrarily large. Moreover, as before, the power 
constraint enters by limiting the rows of G(n) to have a 
squared norm P and by normalizing the codebooks to have 
unit power. 

It suffices to restrict our attention to G(n) of the form 

$$
\mathbf{G}(n) = \mathbf{P} \odot \mathbf{D}(n) \tag{39}
$$




where $\mathbf{P}$ is an M x L (deterministic) power allocation matrix 
with entries $\sqrt{p_{m,l}}$ that do not vary within a block, 

$$
\mathbf{P} = \begin{bmatrix}
\sqrt{p_{1,1}} & \cdots & \sqrt{p_{1,L}} \\
\vdots & \ddots & \vdots \\
\sqrt{p_{M,1}} & \cdots & \sqrt{p_{M,L}}
\end{bmatrix} \tag{40}
$$

and $\mathbf{D}(n)$ is a (random) phase-only "dither" matrix of the form 

$$
\mathbf{D}(n) = \begin{bmatrix}
d_{1,1}(n) & \cdots & d_{1,L}(n) \\
\vdots & \ddots & \vdots \\
d_{M,1}(n) & \cdots & d_{M,L}(n)
\end{bmatrix} \tag{41}
$$

with $\odot$ denoting elementwise multiplication. In our analysis, 
the $d_{i,j}(n)$ are all i.i.d. in i, j, and n, and are drawn 
independently of all other random variables, including noises, 
messages, and codebooks. As we shall see below, the role of 
the dither is to decorrelate pairs of random variables, hence 
it suffices for $d_{i,j}(n)$ to take values +1 and -1 with equal 
probability. 
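
The encoding rule of this subsection can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `dithered_blocks`, the array layout, and the use of a NumPy random generator for the binary dither are choices made here, not part of the construction in the paper.

```python
import numpy as np


def dithered_blocks(codewords, powers, rng):
    """Form M redundancy blocks from L base codewords.

    codewords: (L, N) complex array; row l is the unit-power codeword of layer l.
    powers:    (M, L) nonnegative array of p_{m,l}; each row sums to P.
    Returns (blocks, dither): blocks has shape (M, N), dither has shape (M, L, N).
    """
    powers = np.asarray(powers, dtype=float)
    M, L = powers.shape
    N = codewords.shape[1]
    # Binary dither d_{m,l}(n) in {+1, -1}, i.i.d. over m, l and n.
    dither = rng.choice([-1.0, 1.0], size=(M, L, N))
    # G(n) = P (elementwise) D(n), where P holds the amplitudes sqrt(p_{m,l}).
    gains = np.sqrt(powers)[:, :, None] * dither
    # x_m(n) = sum over l of G_{m,l}(n) c_l(n)
    blocks = np.einsum("mln,ln->mn", gains, codewords)
    return blocks, dither
```

A decoder that knows the dither can undo it for one layer by multiplying a received block by the conjugate (here, identical) dither sequence of that layer; the other layers then appear with random signs and behave like uncorrelated interference.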



B. Decoding 

To obtain a near-perfect rateless code, it will be sufficient 
to employ a successive cancellation decoder with maximal 
ratio combining (MRC) of the redundancy blocks. While, in 
principle, an MMSE-based successive cancellation decoder 
enables higher performance, as we will see, an MRC-based 
one is sufficient for our purposes, and simplifies the analysis. 
Indeed, although the encoding we choose creates a per- 
layer channel that is time-varying, the MRC-based successive 
cancellation decoder effectively transforms the channel back 
into a time-invariant one, for which any of the traditional 
low-complexity capacity-approaching codes for the AWGN 
channel are suitable as a base code in the design.⁵ 

The decoder operation is as follows, assuming the SNR is 
such that decoding is possible from m redundancy blocks. To 
decode the Lth (top) layer, the dithering is first removed from 
the received waveform by multiplying by the conjugate dither 
sequence for that layer. Then, the m blocks are combined 
into a single block via the appropriate MRC for that layer. 
The message in this Lth layer is then decoded, treating the 
undecoded layers as noise, and its contribution subtracted 
from the received waveform. The (L-1)st layer is now the 
top layer, and the process is repeated, until all layers have 
been decoded. Note that the use of MRC in decoding is 
equivalent to treating the undecoded layers as white (rather 
than structured) noise, which is the natural approach when the 
dither sequence structure in those undecoded (lower) layers is 
ignored in decoding the current layer of interest. 

We now introduce notation that allows the operation of the 
decoder to be expressed more precisely. We then determine 
the effective SNR seen by the decoder at each layer of each 
redundancy block. 

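Since G(n) is drawn i.i.d., the overall channel is i.i.d., and 
thus we may express the channel model in terms of an arbitrary 
individual element in the block. Specifically, our received 
waveform can be expressed as [cf. (Q]) and (25)] 

$$
\mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_M \end{bmatrix}
= \alpha_m \mathbf{G} \begin{bmatrix} c_1 \\ \vdots \\ c_L \end{bmatrix}
+ \begin{bmatrix} z_1 \\ \vdots \\ z_M \end{bmatrix}
$$

where $\mathbf{G} = \mathbf{P} \odot \mathbf{D}$, with $\mathbf{G}$ denoting the arbitrary element in the 
sequence G(n), and where $y_m$ is the corresponding received 
symbol from redundancy block m (and similarly for $c_m$, $z_m$, 
$\mathbf{D}$). 

If layers l + 1, l + 2, ..., L have been successively decoded 
from m redundancy blocks, and their effects subtracted from 
the received waveform, the residual waveform is denoted by 

$$
\mathbf{v}_{m,l} = \alpha_m \mathbf{G}_{m,l} \begin{bmatrix} c_1 \\ \vdots \\ c_l \end{bmatrix}
+ \begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix} \tag{42}
$$

where we continue to let $\mathbf{G}_{m,l}$ denote the m x l upper-left 
submatrix of $\mathbf{G}$, and likewise for $\mathbf{D}_{m,l}$ and $\mathbf{P}_{m,l}$. As additional 
notation, we let $\mathbf{g}_{m,l}$ denote the m-vector formed from the 
upper m rows of the lth column of $\mathbf{G}$, whence 

$$
\mathbf{G}_{m,l} = [\,\mathbf{g}_{m,1} \;\; \mathbf{g}_{m,2} \;\; \cdots \;\; \mathbf{g}_{m,l}\,], \tag{43}
$$

and likewise for $\mathbf{d}_{m,l}$ and $\mathbf{p}_{m,l}$. 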

With such notation, the decoding can be expressed as 
follows. Starting with $\mathbf{v}_{m,L} = \mathbf{y}$, decoding proceeds. After 
layers l + 1 and higher have been decoded and removed, we 
decode from $\mathbf{v}_{m,l}$. Writing 

$$
\mathbf{v}_{m,l} = \alpha_m \mathbf{g}_{m,l}\, c_l + \mathbf{v}_{m,l-1}
= \alpha_m (\mathbf{d}_{m,l} \odot \mathbf{p}_{m,l})\, c_l + \mathbf{v}_{m,l-1}, \tag{44}
$$

the operation of removing the dither can be expressed as 

$$
\mathbf{d}^{*}_{m,l} \odot \mathbf{v}_{m,l} = \alpha_m \mathbf{p}_{m,l}\, c_l + \mathbf{v}'_{m,l-1}, \tag{45}
$$

where 

$$
\mathbf{v}'_{m,l-1} = \mathbf{d}^{*}_{m,l} \odot \mathbf{v}_{m,l-1}. \tag{46}
$$

⁵ More generally, the MRC-based decoder is particularly attractive for 
practical implementation. Indeed, as each repetition block arrives a sufficient 
statistic for decoding can be accumulated without the need to retain earlier 
repetitions in buffers. The computational cost of decoding thus grows linearly 
with block length while the memory requirements do not grow at all. This is 
much less complex than the MMSE decoder used in Section IV. 


The MRC decoder treats the dither in the same manner as 
noise, i.e., as a random process with known statistics but 
unknown realization. Because the entries of the dither matrix 
are chosen to be i.i.d. random phases independent of the 
messages, the entries of $\mathbf{D}_{m,l}$ and $[c_1 \cdots c_{l-1}]$ are jointly 
and individually uncorrelated, and the effective noise $\mathbf{v}'_{m,l-1}$ 
seen by the MRC decoder has diagonal covariance 

$$
\mathbf{K}_{\mathbf{v}'} = \operatorname{diag}\Bigl( |\alpha_m|^2 (p_{m',1} + \cdots + p_{m',l-1}) + \sigma^2 \Bigr)_{m'=1,\ldots,m}.
$$

The effective SNR at which this lth layer is decoded from 
m blocks via MRC is thus 

$$
\sum_{m'=1}^{m} \mathrm{SNR}_{m',l}(\alpha_m), \tag{47}
$$

where 

$$
\mathrm{SNR}_{m',l}(\alpha_m) = \frac{|\alpha_m|^2\, p_{m',l}}{|\alpha_m|^2 (p_{m',1} + \cdots + p_{m',l-1}) + \sigma^2}. \tag{48}
$$

Note that we have made the dependency of these per-layer 
per-block SNRs on $\alpha_m$ explicit in the notation. 
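
To make the per-layer, per-block SNR and its post-MRC sum concrete, here is a minimal numerical sketch. The function names `per_block_snr` and `mrc_snr` and the array layout (blocks along rows, layers along columns) are assumptions made for this illustration.

```python
import numpy as np


def per_block_snr(powers, alpha_sq, noise_var=1.0):
    """Per-block, per-layer SNR for every block m' (rows) and layer l (columns)."""
    powers = np.asarray(powers, dtype=float)
    # Interference seen by layer l: power of the undecoded layers 1..l-1.
    below = np.cumsum(powers, axis=1) - powers
    return alpha_sq * powers / (alpha_sq * below + noise_var)


def mrc_snr(powers, alpha_sq, noise_var=1.0):
    """Post-MRC SNR per layer: the per-block SNRs of a layer simply add."""
    return per_block_snr(powers, alpha_sq, noise_var).sum(axis=0)


# Illustrative numbers: two blocks, four layers, gain |alpha|^2 = 1/17.
example_powers = [[3.0, 12.0, 48.0, 192.0], [40.8, 86.7, 86.7, 40.8]]
print(mrc_snr(example_powers, 1.0 / 17.0))
```

With these illustrative powers, the base-2 logarithms of one plus the per-block SNRs add up to the same value in every layer, which is the kind of balance the power allocation discussed later is meant to achieve.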

C. Efficiency 

The use of random dither at the encoder and MRC at the 
decoder both cause some loss in performance relative to the 
perfect rateless codes presented earlier. In this section we show 
that these losses can be made small. 

When a coding scheme is not perfect, its efficiency quantifies 
how close the scheme is to perfect. There are ultimately several 
ways one could measure efficiency that are potentially useful 
for engineering design. Among these, we choose the following 
efficiency notion: 

1) We find the ideal thresholds $\{\alpha_m\}$ for a perfect code of 
rate R. 

2) We determine the highest rate R' such that an imperfect 
code designed at rate R' is decodable with m redundancy 
blocks when the channel gain is $\alpha_m$, for all 
m = 1, 2, .... 

3) We measure efficiency $\eta$ by the ratio R'/R, which is 
always less than unity. 

With this notion of efficiency, we further define a coding 
scheme as near-perfect if the efficiency so-defined approaches 
unity when sufficiently many layers L are employed. 

The efficiency of our scheme ultimately depends on the 
choice of our power allocation matrix (40). We now show 
the main result of this section: provided there exists a power 
allocation matrix such that for each l and m 

$$
\frac{R}{L} = \sum_{m'=1}^{m} \log\bigl(1 + \mathrm{SNR}_{m',l}(\alpha_m)\bigr), \tag{49}
$$

with $\mathrm{SNR}_{m',l}(\alpha_m)$ as defined in (48), a near-perfect rateless 
coding scheme results. The existence of such a power 
allocation, as well as an interpretation of (49), is established in 
Appendix II. 



We establish our main result by finding a lower bound on 
the average mutual information between the input and output 
of the channel. Upon receiving m blocks with channel gain 
$\alpha_m$, and assuming layers l+1, ..., L are successfully decoded, 
let $I'_{l,m}$ be the mutual information between the input to the 
lth layer and the channel output. Then 

\begin{align}
I'_{l,m} &= I(c_l;\, \mathbf{v}_{m,l} \mid \mathbf{d}_{m,l}) \tag{50} \\
&= I(c_l;\, \alpha_m \mathbf{p}_{m,l}\, c_l + \mathbf{v}'_{m,l-1} \mid \mathbf{d}_{m,l}) \tag{51} \\
&\ge I(c_l;\, \alpha_m \mathbf{p}_{m,l}\, c_l + \mathbf{v}'_{m,l-1}) \tag{52} \\
&\ge I(c_l;\, \alpha_m \mathbf{p}_{m,l}\, c_l + \mathbf{v}''_{m,l-1}) \tag{53} \\
&= \log\Bigl(1 + \sum_{m'=1}^{m} \mathrm{SNR}_{m',l}(\alpha_m)\Bigr) \tag{54}
\end{align}

where (51) follows from (45)-(46), (52) follows from the 
independence of $c_l$ and $\mathbf{d}_{m,l}$, and (53) obtains by replacing 
$\mathbf{v}'_{m,l-1}$ with a Gaussian random vector $\mathbf{v}''_{m,l-1}$ of covariance 
$\mathbf{K}_{\mathbf{v}'}$. Lastly, to obtain (54) we have used (47) for the post- 
MRC SNR. 

Now, if the assumption (49) is satisfied, then the right-hand 
side of (54) is further bounded for all m by 

$$
I'_{l,m} \ge \log\Bigl(1 + \ln 2\,\frac{R}{L}\Bigr), \tag{55}
$$

where we have applied the inequality $\ln(1 + u) < u$ 
(valid for u > 0) to (49) to conclude that $(\ln 2) R/L < 
\sum_{m'=1}^{m} \mathrm{SNR}_{m',l}(\alpha_m)$. Note that the lower bound (55) may 
be quite loose; for example, $I'_{l,m} = R/L$ when m = 1. 

Thus, if we design each layer of the code for a base code 
rate of 

$$
\frac{R''}{L} = \log\Bigl(1 + \ln 2\,\frac{R}{L}\Bigr), \tag{56}
$$

(55) ensures decodability after m blocks are received when 
the channel gain is $\alpha_m$, for m = 1, 2, .... 

Finally, rewriting (56) as 

$$
\frac{R}{L} = \frac{2^{R''/L} - 1}{\ln 2}, \tag{57}
$$

the efficiency $\eta$ of the conservatively-designed layered repetition 
code is bounded by 

$$
\eta \ge \frac{R''}{R} = \frac{(\ln 2)\, R''/L}{2^{R''/L} - 1} \ge 1 - \frac{\ln 2}{2}\,\frac{R''}{L}, \tag{58}
$$

which approaches unity as $L \to \infty$ as claimed. 

In Fig. 2, the efficiency bounds (58) are plotted as a function 
of the base code rate R''/L. As a practical matter, our bound 
implies, for instance, that to obtain 90% efficiency requires a 
base code of rate of roughly 1/3 bits per complex symbol. 
Note, too, that when the number of layers is sufficiently large 
that the SNR per layer is low, a binary code may be used 
instead of a Gaussian codebook, which may be convenient for 
implementation. For example, a code with rate 1/3 bits per 
complex symbol may be implemented using a rate- 1/6 LDPC 
code with binary antipodal signaling. 
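
The two lower bounds on efficiency are simple enough to tabulate directly. The short sketch below evaluates them as a function of the base code rate R''/L in bits per complex symbol; the helper name `efficiency_bounds` is an assumption made for illustration.

```python
import math


def efficiency_bounds(base_rate):
    """Middle and right-hand lower bounds on efficiency for base code rate R''/L."""
    u = math.log(2.0) * base_rate
    middle = u / (2.0 ** base_rate - 1.0)   # (ln 2) r / (2^r - 1)
    right = 1.0 - u / 2.0                   # 1 - (ln 2 / 2) r
    return middle, right


for r in (1.0, 0.5, 1.0 / 3.0, 0.1):
    mid, low = efficiency_bounds(r)
    print(f"R''/L = {r:.3f}: eta >= {mid:.4f} >= {low:.4f}")
```

At a base code rate of 1/3 both bounds sit just below 0.89, consistent with the "roughly 90%" figure quoted in the text.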

It thus remains only to show that there exists a power 
allocation such that (49) is satisfied, which is established in 
the Appendix. 
Fig. 2. Lower bound on efficiency of the near-perfect rateless code, plotted 
against the base code rate (b/s/Hz). The top and bottom curves are the middle 
and right-hand bounds of (58), respectively. 

VII. Design and Implementation Issues 

In this section, we comment on some additional issues that 
arise in the development and implementation of such codes. 

A. Increasing Code Resolution 

With an ideal rateless code, every prefix of the code is 
a capacity-achieving code. This corresponds to a maximally 
dense set of SNR thresholds at which decoding can occur. By 
contrast, our focus in the preceding sections was on rateless 
codes that were capacity-achieving only for prefixes whose 
lengths are an integer multiple of the base block length. The 
associated sparseness of SNR thresholds can be undesirable 
in some applications, since when the realized SNR is between 
thresholds, capacity is no longer achieved: the realized rate 
promised by the construction is that corresponding to the next 
lower SNR threshold. 

On the other hand, performance may be much better than 
this pessimistic assessment. Attempting to decode when a 
partial redundancy block is available will cause the decoder 
for the base code to see a time-varying channel in which 
the symbols bolstered by the partial block have better SNR 
than the others. Whether the base code and decoder can be 
adapted to operate efficiently in this situation depends on the 
details of their construction, and may be hard to predict. Their 
performance is, however, easily assessed via simulation. 

Another approach to controlling this aspect of our rateless 
code behavior is as follows. Suppose we are interested in 
a rateless code whose ceiling rate is R. Then we use the 
rateless construction of the preceding section to design a 
code of ceiling rate kR, where 1 < k < M, and have 
the decoder collect at least k blocks before attempting to 
decode. With this approach, the associated rate thresholds are 
R, Rk/(k + 1), Rk/(k + 2), . . . , Rk/M, where we note that 
the largest rate increment is the first, corresponding to the 
factor k/(k + 1). Hence, by choosing larger values of k, one 
can increase the density of rate (and thus SNR) thresholds. 
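
The effect of k on the spacing of the rate thresholds can be seen with a few lines of code. This is a small sketch; `rate_thresholds` is a hypothetical helper that returns the thresholds in units of the ceiling rate R.

```python
from fractions import Fraction


def rate_thresholds(k, M):
    """Rates, in units of R, decodable after k, k+1, ..., M blocks."""
    return [Fraction(k, m) for m in range(k, M + 1)]


for k in (1, 2, 4):
    thresholds = rate_thresholds(k, 4 * k)
    # The largest step is the first one, from R down to R k / (k + 1).
    print(k, [str(t) for t in thresholds[:4]], "first ratio:", thresholds[1] / thresholds[0])
```

For k = 1 the first step halves the rate, whereas for k = 4 it only reduces it to 4/5 of R.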



It should be stressed, however, that there is a price to be 
paid with this approach. In particular, if we keep constant the 
number of codeword symbols that must be accumulated before 
decoding at rate R is possible, then the underlying block size 
in our rateless construction must decrease inversely with k. 
Thus, for sufficiently large k the basic block length becomes 
short enough that code performance suffers, and so in practice 
the selection of k involves a compromise. In addition, this 
approach may also increase requirements on the analog-to- 
digital conversion precision at the receiver front end. 

B. Implementation Comments 

A few additional aspects of implementation are worthy of 
comment. 

First, one consequence of our development of perfect 
rateless codes for M = L is that all layers must have 
the same rate R/L. This does not seem to be a serious 
limitation, as it allows a single base codebook to serve as the 
template for all layers, which in turn generally decreases the 
implementation complexity of the encoder and decoder. The 
codebooks $\mathcal{C}_1, \ldots, \mathcal{C}_L$ used for the L layers should not be 
identical, however, for otherwise a naive successive decoder 
might inadvertently swap messages from two layers or face 
other difficulties that increase the probability of decoding error. 
A simple cure to this problem is to apply pseudorandom 
phase scrambling to a single base codebook $\mathcal{C}$ to generate 
the different codebooks needed for each layer. Pseudorandom 
interleaving would have a similar effect. 
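
One possible way to realize such phase scrambling is sketched below. It is only an illustration: the function name `layer_codebooks`, the choice of uniformly distributed phases, and the use of a shared seed so that encoder and decoder can regenerate the same sequences are all assumptions, not details given in the paper.

```python
import numpy as np


def layer_codebooks(base_codebook, num_layers, seed=0):
    """Derive one codebook per layer by phase-scrambling a shared base codebook.

    base_codebook: (K, N) complex array whose rows are the K base codewords.
    Each layer multiplies every codeword, symbol by symbol, with its own
    pseudorandom unit-modulus sequence of length N.
    Returns an array of shape (num_layers, K, N).
    """
    K, N = base_codebook.shape
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(num_layers, N))
    scramblers = np.exp(1j * phases)
    return scramblers[:, None, :] * base_codebook[None, :, :]
```

Because each scrambling sequence has unit modulus, the power of every codeword is unchanged, and a decoder for layer l can reuse the base decoder after multiplying its input by the conjugate sequence of that layer.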

Second, a layered code designed with the successive 
decoding constraint (35) can be decoded in a variety of ways. 
Because the undecoded layers act as colored noise, an optimal 
decoder should take this into account, for example by using 
a minimum mean-square error (MMSE) combiner on the 
received blocks $\{y_m\}$. The MMSE combining weights will 
change as each layer is stripped off. Alternatively, some or 
all of the layers could be decoded jointly; this might make 
sense when the decoder for the base codebook is 
already iterative, and could potentially accelerate convergence 
compared to a decoder that treats the layers sequentially. 

Finally, a comparatively simple receiver is possible when 
all M blocks have been received from a perfect rateless 
code in which M = L. In this special case the linear 
combinations applied to the layers are orthogonal, hence 
the optimal receiver can decode each layer independently, 
without successive decoding. This property is advantageous 
in a multicasting scenario because it allows the introduction 
of users with simplified receivers that function only at certain 
rates, in this case the lowest supported one. 

Some further design and implementation issues are ad- 
dressed in [21]. 

VIII. Concluding Remarks 

There are a variety of interesting directions for further 
research. For example, one obvious area of future work is 
to incorporate time variation into the channel model (1). The 
rateless constructions presented in this paper are designed to 
operate efficiently when, e.g., for one block the channel gain is 
$[\alpha_1]$, for two blocks the gains are $[\alpha_2\ \alpha_2]$, for three blocks the 
gains are $[\alpha_3\ \alpha_3\ \alpha_3]$, and so on. A simple extension would 
allow $\alpha$ to vary deterministically so long as the pattern of 
variation is known in advance. Then, for one block the code 
would be designed for a gain of $[\alpha_{1,1}]$, for two blocks the 
target gains would be $[\alpha_{2,1}\ \alpha_{2,2}]$, for three blocks the gains 
would be $[\alpha_{3,1}\ \alpha_{3,2}\ \alpha_{3,3}]$, and so on. More generally, however, 
the design of perfect layered rateless codes when $\alpha$ follows a 
stochastic model remains an important open problem. 

Other worthwhile directions include more fully developing 
rateless constructions for the AWGN channel that allow de- 
coding to begin at any received block, and/or to exploit an 
arbitrary subset of the subsequent blocks. Initial efforts in this 
direction include the faster-than-Nyquist constructions in [5], 
[20], and the diagonal subblock layering approach described 
in [20]. 

Beyond the single-input, single-output (SISO) channel, 
many multiterminal and multiuser extensions are also of 
considerable interest. Examples of preliminary developments 
along these lines include the rateless space-time code construc- 
tions in [6], the rateless codes for multiple-access channels 
developed in [16], and the approaches to rateless coding for 
parallel channels examined in [20]. Indeed, such research may 
lead to efficient rateless orthogonal frequency-division mul- 
tiplexing (OFDM) systems and efficient rateless multi-input, 
multi-output (MIMO) codes with wide-ranging applications. 

Finally, extending the layered approach to rateless coding 
developed in this paper beyond the Gaussian channel is also 
a potentially rich direction for further research. A notable 
example would be the binary symmetric channel, where good 
rateless solutions remain elusive. 

Appendix I 
Perfect rateless solution for L = M = 3 



Determining the set of solutions 

$$
\mathbf{G} = \begin{bmatrix}
g_{11} & g_{12} & g_{13} \\
g_{21} & g_{22} & g_{23} \\
g_{31} & g_{32} & g_{33}
\end{bmatrix} \tag{59}
$$

to (35) when L = M = 3 as a function of the ceiling rate R 
is a matter of lengthy if routine algebra. 

We begin by observing that any row or any column of $\mathbf{G}$ 
may be multiplied by a common phasor without changing 
$\mathbf{G}\mathbf{G}^{\dagger}$. Without loss of generality we may therefore take the 
first row and first column of $\mathbf{G}$ to be real. Each $\mathbf{G}$ thus 
represents a set of solutions $\mathbf{D}_1 \mathbf{G} \mathbf{D}_2$, where $\mathbf{D}_1$ and $\mathbf{D}_2$ are 
diagonal matrices in which the diagonal entries have modulus 
1. The solutions in the set are equivalent for most engineering 
purposes and we shall therefore not distinguish them further. 

We know that $\mathbf{G}$ must be a scaled unitary matrix, scaled so 
that the row and column norms are $\sqrt{P}$. Thus, if we somehow 
determine the first two rows of $\mathbf{G}$, there is always a choice for 
the third row: it is the unique vector orthogonal to the first two 
rows which meets the power constraint and which has first 
component real and positive. Conversely, it is easy to see that 
any appropriately scaled unitary matrix $\mathbf{G}$ that satisfies (35) 
for m = 1 and m = 2 (and all l = 1, 2, 3) necessarily satisfies 
(35) for m = 3. We may therefore without loss of generality 
restrict our attention to determining the set of solutions to the 
first two rows of $\mathbf{G}$; the third row comes "for free" from the 
constraint that $\mathbf{G}$ be a scaled unitary matrix. 

Assume, again without loss of generality, that $|\alpha_1|^2 = 1$ 
and $\sigma^2 = 1$. Via (35), the first row of $\mathbf{G}$ (which controls the 
first redundancy block) must satisfy 

\begin{align}
R/3 &= \log(1 + g_{11}^2) \tag{60} \\
2R/3 &= \log(1 + g_{11}^2 + g_{12}^2) \tag{61} \\
3R/3 &= \log(1 + g_{11}^2 + g_{12}^2 + g_{13}^2) \tag{62}
\end{align}

together with the power constraint 

$$
P = g_{11}^2 + g_{12}^2 + g_{13}^2. \tag{63}
$$

Thus 

$$
P = 2^{R} - 1 = x^6 - 1
$$

and 

\begin{align}
g_{11}^2 &= 2^{R/3} - 1 = x^2 - 1, \tag{64} \\
g_{12}^2 &= 2^{R/3}(2^{R/3} - 1) = x^2 (x^2 - 1), \tag{65} \\
g_{13}^2 &= 2^{2R/3}(2^{R/3} - 1) = x^4 (x^2 - 1), \tag{66}
\end{align}

where for convenience we have introduced the change of 
variables $x = 2^{R/6}$. 

The first column of G (which controls the first layer of 
each redundancy block) is also straightforward. Via ( |3~T1 i with 

m = 2 and m = 3, we have 



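\begin{align}
|\alpha_2|^2 &= \frac{1}{x^3 + 1} \tag{67} \\
|\alpha_3|^2 &= \frac{1}{x^4 + x^2 + 1}. \tag{68}
\end{align}

Using (35) for l = 1 and m = 2 yields 

$$
R/3 = \log\bigl(1 + |\alpha_2|^2 (g_{11}^2 + g_{21}^2)\bigr). \tag{69}
$$

Substituting the previously computed expressions (64) and 
(67) for $g_{11}^2$ and $|\alpha_2|^2$ into (69) and solving for $g_{21}^2$ yields 

$$
g_{21}^2 = x^3 (x^2 - 1). \tag{70}
$$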



To solve for the second row of $\mathbf{G}$ we use (35) with m = 
l = 2 together with the requirement that the first and second 
rows be orthogonal. It is useful at this stage to switch to polar 
coordinates, i.e., $g_{22} = |g_{22}| e^{j\theta_1}$ and $g_{23} = |g_{23}| e^{j\theta_2}$. 

Orthogonality of the first and second rows means that 

$$
0 = g_{11} g_{21} + g_{12} |g_{22}| e^{j\theta_1} + g_{13} |g_{23}| e^{j\theta_2}. \tag{71}
$$

Complex conjugation is not needed here because the first row 
is real. Substituting the quantities (64)-(66) and (70) into (71), 
using the power constraint 

$$
|g_{23}|^2 = P - |g_{22}|^2 - g_{21}^2, \tag{72}
$$

and dividing through by $x\sqrt{x^2 - 1}$ yields 

$$
0 = \sqrt{x(x^2 - 1)} + |g_{22}| e^{j\theta_1}
+ x\, e^{j\theta_2} \sqrt{x^6 - x^5 + x^3 - 1 - |g_{22}|^2}. \tag{73}
$$

By isolating the rightmost of the three terms in the above 
equation and taking the squared modulus of both sides, we 
find that 

$$
x(x^2 - 1) + |g_{22}|^2 + 2\cos\theta_1\, |g_{22}| \sqrt{x(x^2 - 1)}
= x^2 \bigl(x^6 - x^5 + x^3 - 1 - |g_{22}|^2\bigr), \tag{74}
$$

so that 

$$
2\cos\theta_1\, |g_{22}| = \frac{x^8 - x^7 + x^5 - x^3 - x^2 + x - (1 + x^2)|g_{22}|^2}{\sqrt{x(x^2 - 1)}}. \tag{75}
$$

This ungainly expression will allow us to eliminate $\theta_1$ from a 
subsequent equation. 

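To complete the calculation of the second row of $\mathbf{G}$, we 
use (35) with m = 2 and l = 1, 2 to infer that 

$$
2^{2R/3} = \det\bigl(\mathbf{I} + |\alpha_2|^2\, \mathbf{G}_{2,2} \mathbf{G}_{2,2}^{\dagger}\bigr). \tag{76}
$$

To expand the right hand side of (76), we compute 

$$
\mathbf{G}_{2,2} \mathbf{G}_{2,2}^{\dagger} = \begin{bmatrix}
x^4 - 1 & (x^2 - 1)\, x^{3/2} + x \sqrt{x^2 - 1}\, |g_{22}|\, e^{-j\theta_1} \\
(*) & |g_{22}|^2 + x^3 (x^2 - 1)
\end{bmatrix} \tag{77}
$$

where (*) is the complex conjugate of the upper right entry, 
from which we find 

$$
\det\bigl(\mathbf{I} + |\alpha_2|^2\, \mathbf{G}_{2,2} \mathbf{G}_{2,2}^{\dagger}\bigr)
= \frac{x^2 (x + 1)}{(x^3 + 1)^2} \Bigl( x^6 - x^4 + x^3 + x^2
+ |g_{22}|^2 - 2\cos\theta_1\, |g_{22}|\, (x - 1) \sqrt{x(x^2 - 1)} \Bigr). \tag{78}
$$

The term $2\cos\theta_1 |g_{22}|$ in (78) matches the left hand side 
of (75), so by combining (75), (76), and (78), solving for 
$|g_{22}|^2$, and simplifying terms, we arrive at 

$$
|g_{22}|^2 = (x^5 + 1)(x - 1). \tag{79}
$$

The power constraint (72) then immediately yields 

$$
|g_{23}|^2 = x (x^2 - 1). \tag{80}
$$
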

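The squared moduli of the entries of the last row of $\mathbf{G}$ 
follow immediately from the norm constraint on the columns: 

\begin{align}
|g_{31}|^2 &= P - g_{11}^2 - g_{21}^2 = x^2 (x^2 - x + 1)(x^2 - 1), \tag{81} \\
|g_{32}|^2 &= P - g_{12}^2 - |g_{22}|^2 = x (x^3 + 1)(x - 1), \tag{82} \\
|g_{33}|^2 &= P - g_{13}^2 - |g_{23}|^2 = (x^3 + 1)(x - 1). \tag{83}
\end{align}

This completes the calculation of the squared modulus of the 
entries of $\mathbf{G}$. In summary, we have shown that $\mathbf{G}$ has the form 

$$
\mathbf{G} = \sqrt{x - 1} \begin{bmatrix}
\sqrt{x + 1} & \sqrt{x^2 (x + 1)} & \sqrt{x^4 (x + 1)} \\
\sqrt{x^3 (x + 1)} & e^{j\theta_1} \sqrt{x^5 + 1} & e^{j\theta_2} \sqrt{x (x + 1)} \\
\sqrt{x^2 (x^3 + 1)} & e^{j\theta_3} \sqrt{x (x^3 + 1)} & e^{j\theta_4} \sqrt{x^3 + 1}
\end{bmatrix} \tag{84}
$$

where $x = 2^{R/6}$. 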



We must now establish the existence of suitable $\theta_1, \ldots, \theta_4$. 
To resolve this question it suffices to consider the 
consequences of the orthogonality constraint (71) on $\theta_1$ and $\theta_2$. 
As remarked at the start of this section, the last row of $\mathbf{G}$ and 
hence $\theta_3$ and $\theta_4$ come "for free" once we have the first two 
rows of $\mathbf{G}$. 

Substituting the expressions for $|g_{ml}|^2$ determined above 
into (71) and canceling common terms yields 

$$
0 = \sqrt{x} + e^{j\theta_1} \sqrt{x^4 - x^3 + x^2 - x + 1} + e^{j\theta_2}\, x \sqrt{x}. \tag{85}
$$

The right-hand side is a sum of three phasors of predetermined 
magnitude, two of which can be freely adjusted in phase. In 
geometric terms, the equation has a solution if we can arrange 
the three complex phasors into a triangle, which is possible if 
and only if the longest side of the triangle is no longer than the 
sum of the lengths of the shorter sides. The resulting triangle 
is unique (up to complex conjugation of all the phasors). Now, 
the middle term of (85) grows faster in x than the others, so 
for large x we cannot possibly construct the desired triangle. 
A necessary condition for a solution is thus 

$$
\frac{\sqrt{x^4 - x^3 + x^2 - x + 1}}{\sqrt{x}\,(x + 1)} \le 1, \tag{86}
$$

where equality can be shown (after some manipulation) to hold 
at the largest root of $x^2 - 3x + 1$, i.e., at $x = (3 + \sqrt{5})/2$, or 
equivalently $R = 6\log_2 x = 6\log_2(3 + \sqrt{5}) - 6$. It becomes 
evident by numerically plotting the quantities involved that this 
necessary condition is also sufficient, i.e., a unique solution to 
(85) exists for all values of x in the range $1 < x \le (3 + 
\sqrt{5})/2$ and no others. Establishing this fact algebraically is an 
unrewarding though straightforward exercise. 
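
The manipulation behind the equality condition can be sketched as follows. It assumes the three phasors are normalized to have magnitudes $\sqrt{x}$, $\sqrt{x^4 - x^3 + x^2 - x + 1}$, and $x\sqrt{x}$ (any common positive scaling leads to the same condition), and requires that the middle magnitude not exceed the sum of the other two:

```latex
\begin{align*}
\sqrt{x^4 - x^3 + x^2 - x + 1} \le \sqrt{x}\,(1 + x)
&\iff x^4 - x^3 + x^2 - x + 1 \le x^3 + 2x^2 + x \\
&\iff x^4 - 2x^3 - x^2 - 2x + 1 \le 0 \\
&\iff (x^2 - 3x + 1)(x^2 + x + 1) \le 0 .
\end{align*}
```

Since $x^2 + x + 1 > 0$ for all real x, the condition reduces to $x^2 - 3x + 1 \le 0$, which for x > 1 holds exactly when $x \le (3 + \sqrt{5})/2$.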

A relatively compact formula for $\theta_1$ may be found by 
applying the law of cosines to (85): 

$$
\cos(\pi - \theta_1) = \frac{x^4 - 2x^3 + x^2 + 1}{2\sqrt{x (x^4 - x^3 + x^2 - x + 1)}}. \tag{87}
$$

Similar formulas may be derived for $\theta_2$, $\theta_3$, and $\theta_4$. 

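
The closed-form magnitudes and the triangle condition lend themselves to a numerical check. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes unit noise variance and $|\alpha_1|^2 = 1$, obtains the third row from the null space of the first two via an SVD (a choice made here for convenience), and reads the constraint (35) as $\log_2\det(\mathbf{I} + |\alpha_m|^2 \mathbf{G}_{m,l}\mathbf{G}_{m,l}^{\dagger}) = lR/3$ with thresholds $|\alpha_m|^2 = (2^{R/m} - 1)/P$. The function names are hypothetical.

```python
import numpy as np


def perfect_gain_matrix(R):
    """Numerically build a 3 x 3 gain matrix for ceiling rate R (bits/symbol)."""
    x = 2.0 ** (R / 6.0)
    if not 1.0 < x <= (3.0 + np.sqrt(5.0)) / 2.0:
        raise ValueError("no solution outside 1 < x <= (3 + sqrt(5)) / 2")
    P = x ** 6 - 1.0
    # Sides of the triangle formed by the three phasors.
    a = np.sqrt(x)
    b = np.sqrt(x**4 - x**3 + x**2 - x + 1.0)
    c = x * np.sqrt(x)
    cos_arg = (a**2 + b**2 - c**2) / (2.0 * a * b)      # law of cosines
    theta1 = np.pi - np.arccos(np.clip(cos_arg, -1.0, 1.0))
    theta2 = np.angle(-(a + b * np.exp(1j * theta1)))   # closes the triangle
    s = np.sqrt(x - 1.0)
    row1 = s * np.array([np.sqrt(x + 1.0),
                         x * np.sqrt(x + 1.0),
                         x**2 * np.sqrt(x + 1.0)], dtype=complex)
    row2 = s * np.array([np.sqrt(x**3 * (x + 1.0)),
                         np.exp(1j * theta1) * np.sqrt(x**5 + 1.0),
                         np.exp(1j * theta2) * np.sqrt(x * (x + 1.0))])
    # Third row: orthogonal to the first two, norm sqrt(P), first entry real >= 0.
    _, _, vh = np.linalg.svd(np.vstack([row1, row2]))
    row3 = vh[2] * np.sqrt(P)
    row3 = row3 * np.exp(-1j * np.angle(row3[0]))
    return np.vstack([row1, row2, row3]), P


def constraint_gap(G, P, R):
    """Largest deviation from log2 det(I + |alpha_m|^2 G_ml G_ml^H) = l R / 3."""
    worst = 0.0
    for m in (1, 2, 3):
        alpha_sq = (2.0 ** (R / m) - 1.0) / P   # threshold gain, unit noise
        for l in (1, 2, 3):
            Gml = G[:m, :l]
            det = np.linalg.det(np.eye(m) + alpha_sq * Gml @ Gml.conj().T)
            worst = max(worst, abs(np.log2(det.real) - l * R / 3.0))
    return worst


G, P = perfect_gain_matrix(3.0)
print(np.round(np.abs(G) ** 2, 4), constraint_gap(G, P, 3.0))
```

The conjugate triangle (negating all phases) gives an equally valid matrix; the sketch simply picks one of the two.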



Appendix II 
Power Allocation 

The power allocation satisfying the property (49) can be 
obtained as the solution to a different but closely related 
rateless code optimization problem. Specifically, let us retain the 
block structuring and layering of the code of Section VI-A but 
instead of using repetition and dithering in the construction, 
let us consider a code where the codebooks in a given layer 
are independent from block to block. While such a code is still 
successively decodable, it does not retain other characteristics 
that make decoding possible with low complexity. However, 
the complexity characteristic is not of interest. What does 
matter to us is that the per-layer, per-block SNRs that result 
from a particular power allocation will be identical to those 
of the code of Section VI-A for the same power allocation. 
Thus, in tailoring our code in this Appendix to meet (49), we 
simultaneously ensure our code of Section VI-A will as well. 

We begin by recalling a useful property of layered codes in 
general that we will apply. Consider an AWGN channel with 
gain $\alpha$ and noise $\mathbf{z}$ of variance $\sigma^2$, and consider an L-layer 
block code that is successively decodable. If the constituent 
codes are capacity-achieving i.i.d. Gaussian codes, and MMSE 
successive cancellation is used, then the overall code will be 
capacity achieving. More specifically, for any choice of powers 
$p_l$ for layers l = 1, 2, ..., L that sum to the power constraint 
P, the associated rates $I_l$ for these layers will sum to the 
corresponding capacity $\log(1 + |\alpha|^2 P/\sigma^2)$. Equivalently, for 
any choice of rates that sum to capacity, the associated 
powers $p_l$ will sum to the corresponding power constraint. In 
this latter case, any rate allocation that yields powers that are 
all nonnegative is a valid one. 

To see this, let the relevant codebooks for the layers be 
$\mathcal{C}_1, \ldots, \mathcal{C}_L$, and let the overall codeword be denoted 

$$
\mathbf{c} = \mathbf{c}_1 + \cdots + \mathbf{c}_L, \tag{88}
$$

where the $\mathbf{c}_l \in \mathcal{C}_l$ are independently selected codewords drawn 
for each layer. The overall code rate is the sum of the rates 
of the individual codes. The overall power of the code is $P = 
p_1 + \cdots + p_L$. 

From the mutual information decomposition 

$$
I(\mathbf{c}; \mathbf{y}) = \sum_{l=1}^{L} I_l \tag{89}
$$

where 

$$
I_l = I\bigl(\mathbf{c}_l;\, \mathbf{c}_1 + \cdots + \mathbf{c}_L + \mathbf{z} \mid \mathbf{c}_{l+1}^{L}\bigr)
$$

with $\mathbf{c}_{l+1}^{L} = (\mathbf{c}_{l+1}, \mathbf{c}_{l+2}, \ldots, \mathbf{c}_L)$, we see that the overall 
codebook power constraint P can be met by apportioning power 
to layers in any way desired, so long as $p_1 + \cdots + p_L = P$. 
Since the undecoded layers are treated as noise, the maximum 
codebook rate for the lth layer is then 

$$
I_l = \log(1 + \mathrm{SNR}_l) \tag{90}
$$

where 

$$
\mathrm{SNR}_l = \frac{|\alpha|^2 p_l}{|\alpha|^2 p_1 + |\alpha|^2 p_2 + \cdots + |\alpha|^2 p_{l-1} + \sigma^2} \tag{91}
$$

is the effective SNR when decoding the lth layer. Straightforward 
algebra, which amounts to a special-case recalculation 
of (89), confirms that $I_1 + \cdots + I_L = \log(1 + |\alpha|^2 P/\sigma^2)$ for 
any selection of powers $\{p_l\}$. 

Alternatively, instead of selecting per-layer powers and 
computing corresponding rates, one can select per-layer rates 
and compute the corresponding powers. The rates $\{I_l\}$ for 
each level may be set in any way desired so long as the 
total rate $I_1 + \cdots + I_L$ does not exceed the channel capacity 
$\log(1 + |\alpha|^2 P/\sigma^2)$. The required powers $\{p_l\}$ may then be 
found using (90) and (91) recursively for l = 1, ..., L. There 
is no need to verify the power constraint: it follows from 
(89) that the powers computed in this way sum to P. Thus 
it remains only to check that the $\{p_l\}$ are all nonnegative to 
ensure that the rate allocation is a valid one. 
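
Both directions of this correspondence between per-layer powers and per-layer rates are easy to check numerically. The sketch below uses base-2 logarithms (rates in bits per complex symbol); the function names are assumptions introduced for illustration.

```python
import math


def powers_from_rates(rates, alpha_sq, noise_var=1.0):
    """Per-layer powers p_1..p_L that support the given per-layer rates (bits)."""
    powers, below = [], 0.0
    for rate in rates:                      # l = 1, ..., L
        snr = 2.0 ** rate - 1.0             # SNR required by layer l
        p = snr * (below + noise_var / alpha_sq)
        powers.append(p)
        below += p                          # layers 1..l act as noise for l+1
    return powers


def rates_from_powers(powers, alpha_sq, noise_var=1.0):
    """Per-layer rates (bits) supported by the given per-layer powers."""
    rates, below = [], 0.0
    for p in powers:
        snr = alpha_sq * p / (alpha_sq * below + noise_var)
        rates.append(math.log2(1.0 + snr))
        below += p
    return rates


print(powers_from_rates([2.0, 2.0, 2.0, 2.0], alpha_sq=1.0))
print(sum(rates_from_powers([100.0, 155.0], alpha_sq=1.0)))
```

In the first call the powers sum to 255 = 2^8 - 1, and in the second the rates sum to 8 bits regardless of how the total power of 255 is split, as the decomposition predicts.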

We now apply this insight to our rateless context. The target 
ceiling rate for our rateless code is R, and, as before, a m , 
m = 1,2,..., denotes the threshold channel gains as obtained 
via (ED- 



Comparing (49) with (90) and (91) reveals that (49) can be 
rewritten as 

$$
R_l = \sum_{m'=1}^{m} I_{m',l}(\alpha_m) \tag{92}
$$

for all l = 1, 2, ..., L and m = 1, 2, ..., where 

$$
R_l = R/L \tag{93}
$$

and $I_{m',l}(\alpha_m)$ is the mutual information in layer l from block 
m' when the realized channel gain is $\alpha_m$. Thus, meeting (49) is 
equivalent to finding powers $p_{m',l}$ for each code block m' and 
layer l so that for the given rate allocation $R_l$ (a) the powers 
are nonnegative, (b) the power constraint is met, and (c) when 
the channel gain is $\alpha_m$, the mutual information accumulated 
at the lth layer after receiving code blocks 1, 2, ..., m equals 
$R_l$. 

Since the power constraint is automatically satisfied by any 
assignment of powers that achieves the target rates, it suffices 
to establish that (92) has a solution with nonnegative 
per-layer powers. 

The solution exists and is unique, as can be established by 
induction on m. Specifically, for m = 1 the rateless code is 
an ordinary layered code and the powers $p_{1,1}, \ldots, p_{1,L}$ may 
be computed recursively from [cf. (92)] 

$$
R_l = \sum_{m'=1}^{m} \log\bigl(1 + \mathrm{SNR}_{m',l}(\alpha_m)\bigr), \tag{94}
$$

with $\mathrm{SNR}_{m',l}(\alpha_m)$ as given in (48) for l = 1, ..., L. 

For the induction hypothesis, assume we have a power 
assignment for the first m blocks that satisfies (94). To find the 
power assignment for the (m + 1)st block, observe that when 
the channel gain decreases from $\alpha_m$ to $\alpha_{m+1}$ the per-layer 
mutual information of every block decreases. A nonnegative 
power must be assigned to every layer in the (m + 1)st code 
block to compensate for the shortfall. 

The mutual information shortfall in the lth layer is 

$$
\Delta_{m+1,l} = R_l - \sum_{m'=1}^{m} \log\bigl(1 + \mathrm{SNR}_{m',l}(\alpha_{m+1})\bigr), \tag{95}
$$

and the power $p_{m+1,l}$ needed to make up for this shortfall is 
the solution to 

$$
\Delta_{m+1,l} = \log\bigl(1 + \mathrm{SNR}_{m+1,l}(\alpha_{m+1})\bigr), \tag{96}
$$

viz., 

$$
p_{m+1,l} = \bigl(2^{2\Delta_{m+1,l}} - 1\bigr) \cdot \Bigl(p_{m+1,1} + \cdots + p_{m+1,l-1} + \frac{\sigma^2}{|\alpha_{m+1}|^2}\Bigr). \tag{97}
$$

This completes the induction. Perhaps counter to intuition, 
even if the per-layer rates $R_1, \ldots, R_L$ are set equal, the 
per-layer shortfalls $\Delta_{m+1,1}, \ldots, \Delta_{m+1,L}$ will not be equal. Thus, 
within a layer the effective SNR and mutual information will 
vary from block to block. 

Eqs. (95) and (97) are easily evaluated numerically. An 
example is given in Table III.⁶ 
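
The induction translates directly into a short recursion. The sketch below is one way to carry it out, under assumptions that should be read carefully: it works in bits per complex symbol with a base-2 logarithm, so the power update uses the exponent $\Delta$ rather than $2\Delta$; it normalizes $|\alpha_1|^2 = 1$; and it takes the thresholds as $|\alpha_m|^2 = (2^{R/m} - 1)\sigma^2/P$. With these conventions the entries of Table III appear to be reproduced when the total rate is set to 8 bits over L = 4 layers, i.e., 2 bits per complex symbol per layer, which is equivalent to reading the table's per-layer rate of 1 as bits per real dimension. The function name `power_allocation` is hypothetical.

```python
import numpy as np


def power_allocation(num_blocks, num_layers, total_rate, noise_var=1.0):
    """Per-block, per-layer powers obtained block by block, layer by layer.

    total_rate is in bits per complex symbol (base-2 log, no factor 1/2).
    Returns (gains_sq, powers) with powers[m-1, l-1] = p_{m,l}.
    """
    layer_rate = total_rate / num_layers
    P = (2.0 ** total_rate - 1.0) * noise_var   # assumes |alpha_1|^2 = 1
    gains_sq = np.array([(2.0 ** (total_rate / m) - 1.0) * noise_var / P
                         for m in range(1, num_blocks + 1)])
    powers = np.zeros((num_blocks, num_layers))
    for m in range(num_blocks):
        a2 = gains_sq[m]
        for l in range(num_layers):
            # Mutual information already supplied by earlier blocks at this gain.
            below_prev = powers[:m, :l].sum(axis=1)
            snr_prev = a2 * powers[:m, l] / (a2 * below_prev + noise_var)
            shortfall = layer_rate - np.log2(1.0 + snr_prev).sum()
            # Power in the new block that makes up the shortfall.
            below_new = powers[m, :l].sum()
            powers[m, l] = (2.0 ** shortfall - 1.0) * (below_new + noise_var / a2)
    return gains_sq, powers


gains_sq, p = power_allocation(num_blocks=5, num_layers=4, total_rate=8.0)
print(np.round(10.0 * np.log10(gains_sq), 2))
print(np.round(p.T, 2))   # rows: layers l = 1..4, columns: blocks m = 1..5
```

Each column of the printed power table sums to the total power, which follows from the telescoping of the per-layer mutual informations and serves as a convenient check.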

⁶ If one were aiming to use a rateless code of the type described in 
Section VI in practice, in calculating a power allocation one should take 
into account the gap to capacity of the particular base code being used. This 
optimization is developed in [21]. 
TABLE III 

Per-layer power assignments $p_{m,l}$ and channel gain 
thresholds $\alpha_m$ for the initial blocks of an L = 4 layer 
rateless code with total power P = 255, noise variance $\sigma^2$ = 1, 
and per-layer rate R/L = 1 b/s/Hz. 

             m = 1    m = 2    m = 3    m = 4    m = 5 
gain (dB)     0.00   -12.30   -16.78   -19.29   -20.99 
l = 1         3.00    40.80    48.98    55.77    58.79 
l = 2        12.00    86.70    61.21    60.58    61.65 
l = 3        48.00    86.70    81.32    71.48    67.50 
l = 4       192.00    40.80    63.48    67.16    67.06 



Finally, since this result holds regardless of the choice of 
the constituent $R_l$, it will hold for the particular choice (93), 
whence (49). 